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Abstract 

Data transfer is one of the main functions of the Internet. The Internet consists of a large number of interconnected sub¬ 
networks or domains, known as Autonomous Systems (ASes). Due to privacy and other reasons the information about what 
route to use to reach devices within other ASes is not readily available to any given AS. The Border Gateway Protocol (BGP) 
is responsible for discovering and distributing this reachability information to all ASes. Since the topology of the Internet is 
highly dynamic, all ASes constantly exchange and update this reachability information in small chunks, known as routing con¬ 
trol packets or BGP updates. In the view of the quick growth of the Internet there are significant concerns with the scalability 
of the BGP updates and the efficiency of the BGP routing in general. Motivated by these issues we conduct a systematic time 
series analysis of BGP update rates. We find that BGP update time series are extremely volatile, exhibit long-term correlations 
and memory effects, similar to seismic time series, or temperature and stock market price fluctuations. The presented statisti¬ 
cal characterization of BGP update dynamics could serve as a basis for validation of existing and developing better models of 
Internet interdomain routing. 


I. INTRODUCTION 

On large scale, the Internet is a global system of approximately 40, 000 interlinked computer networks connecting billions 
of users and devices worldwide m. These networks are called Autonomous Systems (ASes). ASes vary in size and function: 
they can be (i) Internet Service and/or Transit Providers (AT&T), (ii) Content Providers (Google), (iii) Enterprises (Harvard 
University), and (iv) Non-profit organizations (2). Devices inside ASes are identified via unique Internet Protocol (IP) addresses, 
which are 32- or 128-bit numerical labels that act both as identifiers and locators of devices. An IP address is divided into two 
sections, a network section and a host section. The network section, which is known as IP prefix, identifies a group of hosts, 
while the host section identifies a particular device. An AS can include a number of IP prefixes. 

Each AS is administrated by a single entity, but a single organization may own and operate several ASes. ASes connect 
to each other via contractual agreements that govern the flow of data between and through them. This interconnection of ASes 
shapes the AS-level topology of the Internet, which facilitates connectivity between any pair of ASes and thus any pair of devices 
connected to the Internet (Fig. la). 

The information about how to reach devices within other ASes is not readily available to them. The exchange of this in¬ 
formation is handled by specialized networked computers called routers. Performing routing requires signaling reachability 
information, comparing different possibilities, and maintaining a state that describes how to reach different IP prefixes. The 
Border Gateway Protocol (BGP) m is the globally deployed routing protocol that accomplishes this task. The BGP protocol 
can be summarized as follows. ASes advertise their IP prefixes to their neighbor ASes through BGP update messages. At each 
AS incoming BGP updates are processed by the BGP router and the resulting reachability information is then stored in routing 
tables. 

The Internet is a dynamic system where participating networks and links between them do often experience configuration 
changes, failures, and restorations. BGP protocol reacts to changes in the Internet connectivity incrementally: BGP routers 
send update messages to their neighbor BGP routers. BGP update messages do not carry the information on the whole Internet 
connectivity state. Instead, they carry only the information concerning the affected IP prefixes. Hence, to keep a consistent view 
of the network and, consequently, to be able to communicate with other networks, a BGP router must process incoming BGP 
updates in a timely manner and update its routing table accordingly (Fig. lb, c). 

Current version of the BGP routing protocol was introduced in 1994. Since then, the deployment of the BGP routing protocol 
has sustained tremendous growth and it is arguably one of the main technological reasons behind the success of the Internet. 

Nevertheless, there are two major concerns related to the fast rate of the Internet growth. On one hand, Internet growth 
implies the growth in the number of destinations for the BGP routing and, thus, results in the growth of routing table sizes. 


2 



^ g • 


Router 

User device 



Routing Table 


Destination 

Path 

AS1 

{ASl} 

AS7 

{AS3,AS5,AS7} 


£ BGP updates as a function of time 


10 1 




6 9 12 15 18 21 24 


FIG. 1 : The Internet and BGP routing, a, On large scale the Internet is the product of interconnectivity among a large number of ASes 
(shown with ovals). In order to perform data transfer, ASes need to exchange the reachability information through BGP update messages, b, 
c BGP updates are processed by the BGP routers, b, The reachability information is stored in the routing table, c, Typical dynamics in the 
number of updates received by the BGP router. 


On the other hand, the growth of the Internet also leads to the growth in the number of BGP updates needed to maintain BGP 
routing (4). Both factors are important, especially for routers at the core of the Internet. The growing size of routing tables 
requires increasingly larger and faster memory. At the same time, growing routing table sizes do not necessarily slow down data 
forwarding as long as address lookups are performed using high speed memories and constant-time matching algorithms a. 
Increasingly large amounts of BGP updates, on the other hand, is a more serious concern because processing BGP updates 
can be computationally heavy (updating routing state, generating more updates, checking import/export filters), and can trigger 
wide-scale instabilities 0. 

Recent studies of BGP scalability range from measurements assessing the extent of the concern 13181 to studies suggesting 
radically new routing architectures [2(T0]. Elmokashfi et al. 0 analyzed the dynamics of BGP updates in four networks at the 
backbone of the Internet over a period of seven years and eight months. They have shown that on average the level of BGP 
updates is increasing, but not at an alarming rate: it was shown to grow at rates similar to the growth in the number of ASes. 
However, they have also illustrated that the dynamics of BGP updates is highly volatile even at large time scales, with peak rates 
exceeding the daily averages by several orders of magnitude. 

The complexity of the inter-AS routing system makes it difficult to isolate different factors behind these fluctuations |T 1J12]. 
An approach alternative to inferring this factors directly is to build a realistic model for the dynamics of BGP updates. To this 
end, one needs an in-depth statistical characterization of fluctuations in BGP update time series, which is the subject of this 
work. 

We aim at improving our understanding of these fluctuations, which can help in validating existing models CCD and in de¬ 
veloping better ones. To study the statistical properties of BGP updates, we use historical BGP update logs spanning a period 
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FIG. 2: Time series of BGP updates, a, The number of updates received by the NTT monitor on May 28th, 2010, from 00.00 to 24.00 
GMT. b, the intra-day pattern, Zd(t), and, d, the standard deviation from the intra-day pattern, ad(t), of BGP updates measured for NTT, 
II J, Tinet , and AT&T monitors, c, the intra-week pattern, Z w (t), and, e, the standard deviation from the intra-week pattern, a w (£), of BGP 
updates measured for NTT, II J, Tinet , and AT&T monitors. 


of 8.5 years, collected by the Route Views project |[14] from the BGP routers of four ASes (AT&T, NTT, II J, and Tinet). 
Throughout the manuscript we refer to these routers as monitors. A BGP update log is the time series of BGP updates arriv¬ 
ing at the monitor recorded in 1 second intervals. The four ASes analyzed in this work are among the largest Internet Service 
Providers (ISPs). Therefore, their corresponding BGP update traffic is a reflection of BGP dynamics taking place in the core of 
the Internet, where the BGP update volatility is believed to reach maximum rates. (Detailed information on data collection and 
pre-processing can be found in Appendix [A|. 

To put our study in a broader context we wish to note that many natural and economic systems have also been found to 
exhibit extreme fluctuations. Examples include DNA sequences ita and heartbeat intervals [16], climate variability 031 El, 
earthquakes fl9ll20lL stock markets [ 21-423), and languages 124 . 25l. 


II. RESULTS 

First we highlight the volatility of BGP updates series by reproducing the results of previous works 0. We plot the average 
rate of BGP updates received by the NTT monitor on May 28th, 2010, from 00: 00 to 24: 00 Greenwich Mean Time (GMT). 
As seen from Fig. 2a, in 1 minute interval the NTT monitor receives on the average several hundred updates, while extreme 
fluctuations occasionally produce 10 4 updates per minute. BGP updates are largely driven by two sources: spontaneous BGP 
events and maintenance sessions. The former consist of mostly spontaneous updates, such as misconfigurations, duplicate 
announcements and special events. Maintenance sessions, on the other hand, are periodic by nature and happen at certain times 
of the day on particular days of the week. 

In order to separate the two sources of fluctuations we calculate the intra-day and intra-week patterns for the BGP update time 
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FIG. 3: Extreme events in BGP dynamics.a, The distribution of the number of BGP updates received by the 4 monitors in 1 minute interval, 
All monitors collapse onto a single master curve. Power law regression fit yields a slope of /jl — 2.3 b, The distribution of number of updates 
P{z) received by the NTT monitor calculated for aggregation window sizes At = 1 min, At = 10 min. At = lhour , At = 1 day and 
At — 1 week. 


series. The intra-day pattern, Z& is then defined as the number of events taking place at a specific time of the day, td ay , averaged 
throughout the observation period: 
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where Nd is the total number of days in the observation period, and Z l {td ay ) is the number of events at day i at td ay • The 
intra-week pattern Z w {t wee k ) is defined in a similar way after first normalizing the time series with the intra-day pattern. 
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Here N w is the number of weeks in the observational period and Z l {t wee k) is the normalized number of events at week i at time 
of the week t wee \ c (see Methods for details). 

As seen from Fig. 2b, the intraday BGP update patterns reach maximum values in the interval from approximately 06: 00 to 
10: 00 GMT, which is typical time for scheduling maintenance tasks l26l . The intraweek patterns, in their turn, are characterized 
by higher values during weekdays and smaller values during weekends, (see Fig. 2c). We note that the standard deviations of 
the intraday and intraweek patterns, a^(t) and a w {t), tend to exceed the corresponding average values of the intra-day and the 
intra-week patterns by an order of magnitude, which is consistent with the extreme burstiness of the BGP updates (Fig.2d and 
Fig. 2e). 

To characterize the volatility of the BGP updates we analyze the distribution of the number of BGP updates received by the 
monitor in 1 minute intervals. Figure 3a confirms the volatile nature of BGP updates. We find that all monitors are characterized 
by similar distributions P(Z ). Although the average number of BGP updates received per minute is quite small (Zntt = 250), 
the peak values may occasionally exceed 10 5 BGP updates per minute. The distributions of the number of BGP updates, 
P(Z ), are positively skewed (measured skewness values are: 71 (AT&T) = 45.7, 71 {II J) = 121.2, 71 {Tinet) = 49.1, 
71 (ATT) = 69.1) and the distribution tails scale as a power-law, P{Z) ~ Z~ ^ with fi = 2.51 =b 0.11 (p = 0.992 for IIJ, 
see Appendix | d| for details). We also note that the observed power-law behavior of the tail of P{Z) seems to be independent 
of the aggregation window size (Fig. 3b). The power-law distribution of the number of BGP updates implies that BGP routers 
should be able to cope with surges in the number of updates exceeding the corresponding average levels by several orders of 
magnitude. To understand how and when these surges occur we analyze correlation patterns of the BGP updates. We employ 
three standard methods traditionally used in the time-series analysis: auto-correlation function (ACF), power spectrum (PS), and 
the linear detrended fluctuation analysis (DFA1) (see Methods, Apendix[B] and Ref. l27l for details). 
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FIG. 4: Correlations in the BGP update times series. Fluctuations of the detrended BGP update time series as a function of window size. 


Even though most of BGP update events last less then 1 minute, the duration of some of them may exceed several minutes (28). 
Thus, to avoid possible correlations associated with long BGP updates in our subsequent analysis we use larger aggregation 
window size of A (t) = 10mm. Further, to eliminate possible spurious effects and correlations attributed to periodic activities 
we also normalize the BGP update data with both intra-week and intra-day patterns: 


z(t ) = 


^w (tweek(t)) ^d (j'dayi.t)) 


(4) 


All three methods indicate the presence of long-range correlations in the BGP update time-series (see Fig. 4 and Fig. [6]>. 
Specifically, we find that DFA1 performed for NTT , IIJ and Tinet and AT&T indicates that fluctuations grow as a power-law 
with aggregation window size A, T(A) ~ A a , where a = 0.75 (Fig. 4). To highlight the effects of long-range correlations in 
the BGP updates time series we also performed DFA1 for the randomized counterparts of the BGP updates (see Methods). In 
the randomized case we obtained T ran ^ om (A) ~ A a with a = 0.5, which corresponds to the uncorrelated time series (Fig. 4). 
Similar results are obtained by ACF and PS analysis. The autocorrelation function of the BGP updates decays as a power law 
over several orders of magnitude for all monitors, ACF(Az) ~ z _1 (Fig. [6^). We obtain similar 7 values for three monitors: 
7 = 0.5 for NTT , and 7 = 0.4 for IIJ and Tinet monitors. The power spectrum density, S(f), also decays as a power-law 
with frequency, S(f) ~ where f3 = 0.6 for all monitors (Fig.pb). We note that the obtained values of correlation exponents 
approximately conform with expected relations, 7 = 1 — /3, a = and 7 = 2(1 — a) |QT1 [2914^0 . 

The appearance of long-range correlations in BGP update time series indicates that at a given time the state of a particular 
BGP router is determined by its previous states. Consequently, long-range correlations may imply the presence of memory 
effects in the inter-domain Internet routing. To probe for the latter we ask, what is a typical time interval r separating two large 
events. Formally, we define a return interval r(q) as a time separation between two consecutive events z(t\) and 2(^2), such 
that z(ti) > q and zfo) > q (see Fig. 5a). The evidence of memory in BGP update time series is seen in Fig. 5b, which 
displays typical sequence of 500 consecutive return intervals for the NTT monitor. The original return interval data (shown in 
magenta) is characterized by patches of extreme return intervals, while there is no such patches in the shuffled data (shown in 
black) obtained by randomizing the time-order of the original series of BGP updates. 

To further explore memory effects we analyze the distribution of return intervals P q (r) for the NTT monitor (Fig. 5c). We 
note that P q (r) decays slower than the Poisson distribution, which is expected for uncorrelated data (Fig. 5d). As q increases, the 
decay of P q (r) becomes slower and the average return interval r(q) increases implying that the larger events become increasingly 
rare (see the inset of Fig. 5c)). We also note that, independent of q , all the distributions P q (r ), upon proper rescaling, collapse 
to a single master curve: 


P,(r) = \S (J) . (5) 

where f(x) does not depend on the threshold value (see Fig. 5e and Fig. [8]). The resulting master curve f(x) fits a stretched 
exponential exp{—x~ 7 ) with exponent 7 = 0.5 (p = 0.08, see Appendix |d|), which approximately matches the observed 
autocorrelation exponent 7 = 0.5 (32). We note that the observed scaling of P q (r) holds not only for NTT but also for the 
other three analyzed monitors (see Appendix [C] and Fig. [7]). 
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FIG. 5: Return interval statistics of BGP updates, a, Schematic illustration of the BGP update return intervals. Shown are the intervals n 
and 72 calculated for threshold q — 1 and q — 2 respectively, b, Typical sequence of 500 BGP update return intervals for NTT , where q — 4, 
calculated for (magenta) original and (black) shuffled data, c, The distribution function P q (r) of BGP update return intervals of the NTT , 
calculated for different values of q. The inset depicts the average return interval r as a function of threshold q. d, P q (r) for BGP update return 
intervals of the NTT monitor calculated for q — 1. Original data is shown with red while shuffled data is shown with black, e, Scaled plots 
of the BGP return intervals for the NTT monitor, f, The mean conditional return interval f as a function of preceding return interval To for 
the NTT monitor. Both f and to are normalized with the mean return interval (r). For BGP updates without memory we expect f (to) = 1, 
as supported by the open symbols obtained for shuffled return interval data. 
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Finally, to test memory effects directly we measure the average return interval f following immediately after return intervals 
of fixed duration To. Figures 5f and Fig. [8] depict f as a function of To for three possible values of threshold q (filled symbols). 
We observe that f increases as a function of To, indicating that on the average longer (shorter) return intervals tend to follow 
longer (shorter) intervals. In contrast, f is independent of preceding return interval To for randomized data (open symbols in 
Fig. 5f and Fig. [8]). 


III. DISCUSSION 

In this work, we investigated the statistical properties of BGP updates. Complementing previous studies, we confirmed 
that the rate of BGP updates is highly volatile, with extreme events at times exceeding the average rates by up to 4 orders of 
magnitude. We established that the distribution in the number of BGP updates received by a BGP monitor in a given time 
window is characterized by a power-law tail with exponent fi = 2.5. We also found (using three independent methods) that the 
BGP update time series exhibit long-range correlations. The analysis of the return interval data revealed the universal scaling 
in the distribution of return intervals P q {r). We also found memory effects in the return interval data. Small (or large) return 
intervals separating BGP update events are more likely to be followed by small (or large) intervals. 

The observed volatility and correlation properties of the BGP update dynamics place interdomain Internet routing into the 
same class of phenomena as earthquakes [ 19., 20 ], climate ED, stock markets and languages mm. Unlike these 

systems, however, the Internet routing is a fully engineered system. The observed dynamical similarities between these stochastic 
systems imply that the key mechanisms underlying Internet routing are in a certain way similar to the mechanisms governing 
the dynamics of stock markets or seismic movements in the Earth crust. 

As with stock market price dynamics, one would wish to be able to predict BGP dynamics, or at least extreme events in it. To 
this end, one could benefit from the return interval scaling. The established scaling of P q (r) may allow one to approximate the 
statistics of return intervals for large events (characterized by large q values) using the much richer statistics of return intervals 
of smaller events. 

The observed long-range correlations and memory effects indicate that the communication patterns between BGP routers are 
an outcome of an interplay between certain semi-deterministic processes. Such processes are well known at the low level of the 
operation of an individual BGP router (e.g. BGP route selection process). Yet this knowledge is as helpful as the knowledge 
about the dynamical properties of an individual molecule in a gas—when studying the properties of this gas (or the Internet in 
our case), some molecular details do matter, but most details are irrelevant. 

Therefore the identification of a proper level of abstraction in modeling the dynamics of BGP routing is an important problem 
for understanding Internet dynamics. The statistical analysis of the BGP update time series that we have conducted here should 
serve as a basis for validation of existing models and for developing better ones. 


IV. MATERIALS AND METHODS 


Intraday and Intraweek Patterns 

Consider series Z(t ), where Z is the number of events taking place at time t , and t is specified as UNIX timestamps. We 
first define functions td ay {t ) and t wee k(t ) which map Unix timestamps t to respectively specific time of the day or specific 
time of the week ( td ay G [0: 00, 24: 00], t wee j c G [Sunday, 0: 00, Saturday , 24: 00]). Both td ay and t wee k are calculated 
corresponding to the GMT time zone. 

The intra-day pattern, Zd, is then defined as the number of events taking place at a specific time of the day, td ay , averaged 
throughout the observation period: 


1 v d 

Zd{tda y ) = % (' td ay ), ( 6 ) 

where Nd is the total number of days in the observation period, and Z l {td ay ) is the number of events at day i at td ay • The 
intra-week pattern Z w (t wee k ) is defined in a similar way after first normalizing the time series with the intra-day pattern. 
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Here N w is the number of weeks in the observational period and Z l (t wee k) is the normalized number of events at week i at time 
of the week t wee k . 

The standard deviations of the intraday and intraweek patterns are defined as 
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Detrended Fluctuation Analysis 

Detrended Fluctuation Analysis is a method designed to study correlations in time series l27l . Here we employ the linear 
version of the DFA, defined as follows. We first calculate the cumulative BGP update time series: 

y{t)=Y j {z{t')-z), (11) 

t'=ti 

where t{ is the initial time value in the series, z(t) is the original time series and z is its average value. The cumulative time series 
y(t) is then divided into boxes of equal size A. In each box, a least squares linear fit to the y(t) data is performed, representing 
the trend in that box. That is, for each box A we determine linear approximation for the corresponding piece of the time series: 

2/a (t) = m A t + 6 a , (12) 

where m A and b A are the slope and the intercept of the straight line. Next we detrend the integrated time series, y(t), by 
subtracting the local trend, y A (t), in each box. The root-mean-square fluctuation of this integrated and detrended time series is 
calculated: 


F( A) = 



tf 


t=ti 


(13) 


where N is the total number of points in the original time series, U and tf are respectively the initial and final time values in the 
series. 

This fluctuation measurement process is repeated at a range of different box sizes A. The fluctuations typically exhibit a 
power law scaling as a function of box size: 


F( A) - A a , (14) 

depending on the observed exponent a one can distinguish anti-correlated fluctuations (a < 1/2), uncorrelated fluctuations 
(a = 1/2), and correlated fluctuations (a > 1/2). 

Data Randomization 

To assess the significance of correlations and memory effects in the BGP update time series we compare original results to 
those obtained for randomized (shuffled) datasets. In all experiments the randomization is performed at the most basic level: 
for a given time series Z(t) we obtain its randomized (shuffled) counterpart by randomly rearranging time stamps attributed to 
each element in the series. Shuffled data is subsequently normalized and binned using the same procedures as those applied to 
original data. 
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Appendix A: The collection and pre-processing of the BGP update data 

Our analysis is based on BGP update traces collected by the Route Views project M- Routeviews collects the BGP update 
data from select ASes. BGP routers in these ASes are referred to as monitors. Monitors send BGP updates to the Routeviews 
collector every time there is a routing change. BGP updates are recorded by the Routeviews with a granularity of one second. 
We focus on update traces from monitors at large transit networks in the core of the Internet. Specifically, we analyze the BGP 
update time series from four monitors: AT&T, NTT , II J, and Tinet. 

AT&T (American Telephone & Telegraph) is an American multinational telecommunications corporation, headquartered in 
Dallas, TX. AT&T is one of the largest providers of telephone services in the United States. AT&T also provides broad¬ 
band subscription to television services. NTT (Nippon Telegraph and Telephone) is a Japanese telecommunications company 
headquartered in Tokyo, Japan and is one of the largest telecommunications companies in the world in terms of revenue. IIJ 
(Internet Initiative Japan) is the first Japan’s Internet provider, which is headquartered in Tokyo, Japan. IIJ is currently known 
as total solutions provider, offering network services, value-added outsourcing services, and cloud computing, WAN services 
and systems integration services. Tinet (The Tiscali International Network) is an Italian Internet service provider headquartered 
in Cagliari, Italy. AT&T , NTT , and Tinet are present worldwide. IIJ has very strong presence in Japan and also operates in 
the US, UK, Germany, China, Hong Kong, Indonesia, Singapore, and Thailand. 

AT&T, NTT, II J, and Tinet monitors correspond to the largest Internet Service Providers and belong to the Default Free 
Zone. In other words, they have a route to every destination prefix on the Internet. Thus, corresponding BGP update traffic 
is a reflection of BGP dynamics taking place in the core of the Internet, where the BGP update volatility is believed to reach 
maximum rates. Our data spans 8.5 years from mid-2003 through the end of 2011. 

Data pre-processing 

• During the period of observation some of the IP addresses of the monitors changed. We identified these changes and 
concatenated corresponding update time series. 

• In the case BGP session between the monitor and the Routeviews collector is broken and re-established, the monitor re¬ 
announces all its known paths to the collector producing a burst in the number of BGP updates. These local artifacts of 
the RouteViews measurement infrastructure are known as session resets. Session resets do not represent genuine BGP 
routing dynamics. We identified and removed BGP updates corresponding to session resets using the method developed 
in a course of our previous work 1 331 . 

• In addition to session resets we have identified and removed BGP updates corresponding to periods of non-stationarity in 
the dynamics of the monitors, caused by misconfigurations and monitor-specific events. We refer the reader to Ref. f34l 
for a detailed discussion of these non-stationary periods and the methods used for their identification. 

The BGP update datasets used in this work are publicly available at the Figshare repository: http : / /f igshare . com/ 
articles/Correlation_in_global_routing_dynamics/1549778, 


Appendix B: Correlations in BGP Updates Time Series 

In this section we discuss methods we use to characterize correlations in the BGP update times series: Auto-Correlation 
Functions (ACF) and Power Spectrum (PS). Linear Detrended Fluctuation Analysis (DFA1) is discussed in the Methods section 
of the main text. 


1. Auto-Correlation Function 

Correlations in the discrete time series z(t) can be quantified by the ACF [29], which is defined as 
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FIG. 6: Correlations in the BGP update times series, a, The autocorrelation function, ACF and b, The Power Spectrum S(f) 


where N is the total number of elements in the time series, /x and a are respectively the mean and the standard deviation of the 
time series: 
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The values of the ACF range from +1 (very high positive correlation) —1 (very high negative correlation). In the case there is 
no correlations, the ACF(r) ~ 0 . In the case z(t) is correlated for up to to time steps, corresponding ACF(r ) is positive for 
r < to- Among correlated time series one often distinguishes short-range and long-range correlations. In the case of short-range 
correlations, ACF exhibits a fast, typically exponential, decay to zero: 

ACF(t) ~ e" r/c , (B4) 

where £ is the effective correlation length. In the case long-range correlations are present, the ACF decreases to zero much 
slower, typically as a power-law: 


ACF(t) ~ r" 7 . (B5) 

Figure [ 6 ^ depicts the ACF measured for the BGP update time series. All four monitors exhibit power-law scaling of the ACF 
with 7 = 0.5 of the NTT monitor and 7 = 0.4 for other monitors. Noise in the ACF plots is caused by irregular fluctuations in 
the time series as well as possible non-stationarities. 


2. Power-Spectrum 

Power spectrum is defined as the Fourier Transform of the ACF (291: 

/ oo 

ACF(T)e~ 2 * ifT dr 

-OO 

In the case of long-range correlations characterized by ACF ~ r -7 , S(f) also decays as a power-law: 

su) - r 0 , 


where 


(B6) 


(B7) 


/3 = 1 — a 


(B8) 
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FIG. 7: (Left column) The distribution of return intervals, P q (r), for BGP updates of a NTT, c AT&T, e Tinet, and g IIJ monitors, the distributions 
are calculated for different values of threshold q. (Right column) Rescaled plots of the BGP return intervals for of b NTT, d AT&T, f Tinet, and h IIJ 
monitors. 
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FIG. 8: The mean conditional interval f(ro) divided by r as a function of 7 1 for a NTT, b AT&T, c Tinet, and d IIJ monitors. In time series without 
memory, f(ro) = 1, supported by the open symbols that show the shuffled return interval data. 


Even though the Power-Spectrum is fully derived from the ACF, the use of the former in series analysis often help s to decrease 
the noise. As seen from Fig.[b]t>, S(f ) ~ f~P for all four monitors with /? « 0.6, which is consistent with Eq. (B8). 

Figures^ and[6]b complement the results of DFA1, which we report in the main text. 

We note the long-range correlation exponents measured with the three methods (ACF ( 7 ), DFAS1 (a) and PSD (/3) conform 
with expected relations fl7ll29ll30l : 


7 = 

2 — 2a, 

(B9) 


P+l 


a = 

2 ’ 

(BIO) 

7 = 

1-/3 

(Bll) 


Appendix C: Return Intervals and Memory in BGP updates 

Figure^, c, e, g displays the distribution of return intervals, P q (r) calculated for all four monitors. Return intervals calculated 
for different values of the threshold q. Upon rescaling P q (r) follow the same master-curve f(x): 

P q (r) = \f (\) , (d) 

where f(x) does not depend on the threshold value and fits a stretched exponential exp (—x 7 ), where 7 « 0.5 (Fig. 

Appendix [P| for details on data fitting. 

We also note that all four monitors exhibit memory effects. As seen, from Fig. [8] large (small) return intervals are likely to be 
followed by large (small) return intervals. 










13 



0 0.02 0.04 0.06 0.08 0.1 

KS 



0 0.02 0.04 0.06 0.08 


KS 


FIG. 9: KS goodness of fit tests for a, the distribution of number of BGP updates, P(z) for the NTT monitor, and b, the distribution of return intervals P q (r) 
for the NTT monitor. 


Appendix D: Statistical Tests of Data Fitting 

In this section we present the results on data fitting for the distribution of number of updates, P(z), and the distribution of 
return intervals, P q (r), calculated for the NTT monitor. For P(z), we fit time series with aggregation window size of 1 min, 
and for P q (r) we fit return interval time series with q G (1, 2,4,8,16, 32). Note that we pick these values since they yield a 
reasonable sample size for performing the fitting. 

We tested the goodness of fit using the Kolmogorov-Smirnov (KS) goodness of fit test (35]. 

The KS test can be summarized as follows. Given two cumulative distribution functions (CDF), Fi(x) and F 2 (x) one defines 
the KS statistic as 


KS(F u F 2 ) = supj F 1 (x) - F 2 (x) |. (Dl) 

The KS statistic can be viewed as a separation between the two distribution. The smaller the KS the more similar the two 
distributions are. One starts with the calculation of the KS statistic for the empirical Cumulative distribution function (CDF) 
E(pc) and the CDF of its best-fit, BF(x ), which is referred to as KS eb : 

KS eb = KS(E(x),BF(x)) (D2) 

Then, a large number of data samples N s is generated using BF(x ), each data sample containing a large number of values. For 
each data sample i one calculates CDF Ei (x) and corresponding KS statistic 

KSi = KS(Ei(x ), BF{x)) (D3) 

The KS test compares the empirical KS statistic KS eb with the set of synthetic values KSi. Goodness of fit is quantified by the 
p- value by integrating the distribution of synthetic KS values P(KS): 

POO 

p = P(KS)dKS (D4) 

J KSeb 

p- values, therefore can be interpreted as the probability that the observed data was the result of its best fit. Large p- value indicates 
that the empirical distribution matches its best fit as good as synthetic data generated from the fit itself, whereas a small p- value 
(typically p < 0.01) suggests that the empirical distribution can not be the result of its best fit. 

The distribution of the number of BGP updates. We employ the maximum-likelihood method proposed by Clauset et 
al (36l to estimate fi and of the P(z) given by 


P(z) - (D5) 

The maximum likelihood estimate yields a = 2.5 and « 160. To assess the goodness of fit we get KS eb « 0.014 and 
generate N s = 10 3 synthetic data samples each consisting of 10 3 values (Fig. [9^), the resulting p -value is p = 0.992. 
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The distribution of the return intervals obtained for the NTT monitor. We argue that the scaled distribution of the number 
of BGP updates return intervals decays as a stretched exponential distribution in the form P q (r) ~ e~ xl . In the following we 
elaborate on confirming that our empirical data is statistically consistent with the best-fit. To find the best fitting distribution, 
we experimented with fitting the data using the maximum likelihood estimation to various distributions, exponential, power-law, 
stretched exponential. The latter gives the best-fit with exponent 7 « 0.5. 

To confirm that the empirical data is statistically consistent with the best-fit, we perform the KS goodness of fit test. We 
generated N s = 10 3 samples of 10 3 values each from the stretch exponential fit. We then compute the P(KS ) synthetic 
samples and compared it with KS e b « 0.0395 (Fig.[9]b). The obtained p- value is p = 0.08. 
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